A systematic framework for comparing models in both single-step and multi-step forecasting, while balancing accuracy, training efficiency, and prediction horizon, is still lacking. This study evaluates the predictive capabilities of machine learning and deep learning models for water quality time series forecasting, using 22 months of data sampled at 4 h intervals from two monitoring stations on a tributary of the Pearl River. Seven models were employed: Support Vector Regression (SVR), XGBoost, K-Nearest Neighbors (KNN), Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) network, Gated Recurrent Unit (GRU), and PatchTST. In single-step forecasting with a univariate feature set, the LSTM network achieved the highest accuracy, reducing Mean Squared Error (MSE) by 22.0% overall relative to the machine learning models (SVR, XGBoost, KNN) (Welch's t-test, p = 3.03 × 10⁻⁷), while the RNN required significantly less training time. With a multivariate feature set, the deep learning models achieved comparable accuracy, but none showed a significant accuracy gain over the univariate scenario. The KNN model had the lowest accuracy across the error evaluation metrics, and the XGBoost model had the highest computational complexity. In multi-step forecasting, the direct multi-step PatchTST model outperformed the iterated multi-step models (RNN, LSTM, GRU), showing a weaker time-delay effect and a slower loss of accuracy as prediction length increased, although it still required task-specific adjustments to suit river water quality forecasting. The findings provide actionable guidelines for model selection that balance predictive accuracy, training efficiency, and forecasting horizon requirements in environmental time series analysis.
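The distinction between iterated and direct multi-step forecasting can be illustrated with a minimal sketch. It assumes a plain linear least-squares regressor as a stand-in for the study's models (it is not RNN, LSTM, GRU, or PatchTST), a synthetic noisy series with a 6-sample daily cycle (mimicking 4 h sampling, not the study's data), and hypothetical helper names (`make_windows`, `fit_linear`, `iterated_forecast`, `direct_forecast`). The iterated strategy applies a one-step model repeatedly and feeds each prediction back as input, so errors can accumulate; the direct strategy trains one multi-output model that emits the whole horizon in a single pass.

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a 1-D series into (input window, next `horizon` values) pairs."""
    X, Y = [], []
    for t in range(len(series) - lookback - horizon + 1):
        X.append(series[t:t + lookback])
        Y.append(series[t + lookback:t + lookback + horizon])
    return np.asarray(X), np.asarray(Y)

def fit_linear(X, Y):
    """Least-squares linear map with a bias term (stand-in for any trained regressor)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return W

def predict_linear(W, X):
    """Apply the fitted linear map to a batch of input windows."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return Xb @ W

def iterated_forecast(W1, window, horizon):
    """Iterated strategy: one-step model applied repeatedly, predictions fed back."""
    buf = list(window)
    preds = []
    for _ in range(horizon):
        x = np.asarray(buf[-len(window):])[None, :]
        y = predict_linear(W1, x)[0, 0]
        preds.append(y)
        buf.append(y)  # the prediction becomes part of the next input window
    return np.asarray(preds)

def direct_forecast(WH, window):
    """Direct strategy: a multi-output model predicts all horizon steps at once."""
    return predict_linear(WH, np.asarray(window)[None, :])[0]

# Synthetic example (assumed data): daily cycle of 6 samples plus noise.
rng = np.random.default_rng(0)
t = np.arange(600)
series = np.sin(2 * np.pi * t / 6) + 0.1 * rng.standard_normal(len(t))

lookback, horizon = 12, 6
X, Y = make_windows(series[:500], lookback, horizon)
W1 = fit_linear(X, Y[:, :1])  # one-step model for the iterated strategy
WH = fit_linear(X, Y)         # multi-output model for the direct strategy

window = series[500 - lookback:500]
truth = series[500:500 + horizon]
pred_iter = iterated_forecast(W1, window, horizon)
pred_direct = direct_forecast(WH, window)
mse_iter = np.mean((pred_iter - truth) ** 2)
mse_direct = np.mean((pred_direct - truth) ** 2)
print(f"iterated MSE: {mse_iter:.4f}, direct MSE: {mse_direct:.4f}")
```

At a horizon of one step the two strategies coincide, because the first output column of the direct model is fitted to the same target as the one-step model; they diverge only as fed-back predictions enter the iterated model's inputs. Which strategy wins on this toy series says nothing about the study's results.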